

Large Language Model-Based Reward Design for Deep Reinforcement Learning-Driven Autonomous Cyber Defense

Mukherjee, Sayak, Chatterjee, Samrat, Purvine, Emilie, Fujimoto, Ted, Emerson, Tegan

arXiv.org Artificial Intelligence

Designing rewards for autonomous cyber attack and defense learning agents in a complex, dynamic environment is a challenging task for subject matter experts. We propose a large language model (LLM)-based reward design approach to generate autonomous cyber defense policies in a deep reinforcement learning (DRL)-driven experimental simulation environment. Multiple attack and defense agent personas were crafted, reflecting heterogeneity in agent actions, to generate LLM-guided reward designs where the LLM was first provided with contextual cyber simulation environment information. These reward structures were then utilized within a DRL-driven attack-defense simulation environment to learn an ensemble of cyber defense policies. Our results suggest that LLM-guided reward designs can lead to effective defense strategies against diverse adversarial behaviors.
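The mechanics of plugging LLM-generated, persona-specific reward structures into a DRL training loop can be sketched as follows. This is a minimal illustration, not the paper's actual artifacts: the persona names, event types, and reward weights are invented stand-ins for what an LLM prompted with environment context might emit.

```python
# Hypothetical reward structures an LLM might generate for two defender
# personas after being given contextual simulation-environment information.
# All names and weights here are illustrative assumptions.
LLM_REWARD_DESIGNS = {
    "cautious_defender": {"host_compromised": -10.0, "service_disrupted": -5.0,
                          "attack_blocked": +3.0},
    "aggressive_defender": {"host_compromised": -4.0, "service_disrupted": -1.0,
                            "attack_blocked": +8.0},
}

def shaped_reward(persona: str, events: list[str]) -> float:
    """Sum the persona's LLM-designed reward terms over the events seen this step."""
    weights = LLM_REWARD_DESIGNS[persona]
    return sum(weights.get(e, 0.0) for e in events)

# An ensemble of defense policies would be trained with one reward design each;
# here we just evaluate one step's shaped reward under every persona.
ensemble_rewards = {p: shaped_reward(p, ["attack_blocked", "service_disrupted"])
                    for p in LLM_REWARD_DESIGNS}
```

In a full pipeline, each entry of `ensemble_rewards` would instead drive a separate DRL training run, yielding the ensemble of defense policies the abstract describes.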


On the Stochastic Stability of Deep Markov Models

Neural Information Processing Systems

This section proposes additional regularization methods for learning stable deep Markov models. The most direct approach is to include the stability conditions as extra penalties in the DMM loss function.
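A minimal sketch of the penalty idea, assuming the transition mean is locally linear with matrix A: a sufficient condition for stability is that the spectral norm of A stays below one, so a hinge penalty on its excess can be added to the DMM's negative-ELBO loss. The penalty form and weight are illustrative, not the paper's exact regularizer.

```python
import numpy as np

def stability_penalty(A: np.ndarray) -> float:
    """Hinge penalty on the spectral norm of a (linearized) transition matrix:
    zero when ||A||_2 <= 1, growing quadratically beyond that threshold."""
    sigma_max = np.linalg.norm(A, 2)  # largest singular value
    return max(0.0, sigma_max - 1.0) ** 2

def regularized_loss(neg_elbo: float, A: np.ndarray, lam: float = 10.0) -> float:
    """DMM training objective with the stability condition as an extra penalty."""
    return neg_elbo + lam * stability_penalty(A)

stable = 0.5 * np.eye(2)    # contractive transition: no penalty
unstable = 2.0 * np.eye(2)  # expanding transition: penalized
```

Because the penalty is zero on the stable set, it leaves already-stable solutions untouched and only pushes expanding dynamics back toward the contractive region.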



Self-adaptive weighting and sampling for physics-informed neural networks

Chen, Wenqian, Howard, Amanda, Stinis, Panos

arXiv.org Machine Learning

Physics-informed deep learning has emerged as a promising framework for solving partial differential equations (PDEs). Nevertheless, training these models on complex problems remains challenging, often leading to limited accuracy and efficiency. In this work, we introduce a hybrid adaptive sampling and weighting method to enhance the performance of physics-informed neural networks (PINNs). The adaptive sampling component identifies training points in regions where the solution exhibits rapid variation, while the adaptive weighting component balances the convergence rate across training points. Numerical experiments show that applying only adaptive sampling or only adaptive weighting is insufficient to consistently achieve accurate predictions, particularly when training points are scarce. Since each method emphasizes different aspects of the solution, their effectiveness is problem dependent. By combining both strategies, the proposed framework consistently improves prediction accuracy and training efficiency, offering a more robust approach for solving PDEs with PINNs.
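The two components can be sketched with residual-based heuristics: sample collocation points with probability proportional to the PDE residual magnitude, and weight each point's loss term so high-residual (slowly converging) points are emphasized. The specific proportional rules below are common illustrative choices, not necessarily the paper's exact schemes.

```python
import numpy as np

def adaptive_resample(candidates, residual_fn, n_keep, rng):
    """Draw training points with probability proportional to the PDE residual
    magnitude, concentrating points where the solution varies rapidly."""
    r = np.abs(residual_fn(candidates))
    p = r / r.sum()
    idx = rng.choice(len(candidates), size=n_keep, replace=False, p=p)
    return candidates[idx]

def adaptive_weights(residuals, eps=1e-8):
    """Per-point loss weights, normalized to mean one, that emphasize
    high-residual points to balance convergence across the training set."""
    w = np.abs(residuals) + eps
    return w * len(w) / w.sum()

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 1000)
residual = lambda x: np.exp(-200.0 * (x - 0.5) ** 2)  # residual spikes near x = 0.5
pts = adaptive_resample(x, residual, n_keep=100, rng=rng)
```

With the residual peaked at x = 0.5, nearly all resampled points land inside that region, which is exactly the concentration behavior the adaptive sampling component aims for.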


Policy Gradient-Based EMT-in-the-Loop Learning to Mitigate Sub-Synchronous Control Interactions

Mukherjee, Sayak, Hossain, Ramij R., Chatterjee, Kaustav, Nekkalapu, Sameer, Elizondo, Marcelo

arXiv.org Artificial Intelligence

This paper explores the development of learning-based tunable control gains using an EMT-in-the-loop simulation framework (e.g., PSCAD interfaced with Python-based learning modules) to address critical sub-synchronous oscillations. Since sub-synchronous control interactions (SSCI) arise from the mis-tuning of control gains under specific grid configurations, effective mitigation requires adaptive re-tuning of these gains. Such adaptiveness can be achieved with a closed-loop, learning-based framework that accounts for the grid conditions responsible for the oscillations. This paper addresses this need by adopting methodologies inspired by Markov decision process (MDP)-based reinforcement learning (RL), with particular emphasis on simpler deep policy gradient methods augmented with SSCI-specific signal processing modules such as down-sampling, bandpass filtering, and oscillation-energy-dependent reward computation. Our experimentation in a real-world event setting demonstrates that the trained deep policy gradient policy can adaptively compute gain settings in response to varying grid conditions and optimally suppress control-interaction-induced oscillations.
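The oscillation-energy-dependent reward can be sketched as the negative signal energy inside a sub-synchronous frequency band, computed from the measured waveform's spectrum. The band edges, sampling rate, and test frequency below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def subsync_energy(signal, fs, band=(5.0, 45.0)):
    """Energy of the measured signal inside an (assumed) sub-synchronous band,
    taken from the FFT power spectrum; acts as the oscillation severity metric."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return spec[mask].sum() / len(signal)

def oscillation_reward(signal, fs):
    """Negative band energy: the policy gradient agent is rewarded for gain
    settings that suppress sub-synchronous oscillations."""
    return -subsync_energy(signal, fs)

fs = 200.0
t = np.arange(0.0, 2.0, 1.0 / fs)
quiet = 0.01 * np.random.default_rng(1).standard_normal(len(t))
oscillating = quiet + np.sin(2 * np.pi * 20.0 * t)  # 20 Hz SSCI-like mode
```

A waveform carrying a strong 20 Hz mode yields a much lower reward than a quiet one, so gradient updates favor gains that damp the interaction.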


ORCHID: Orchestrated Retrieval-Augmented Classification with Human-in-the-Loop Intelligent Decision-Making for High-Risk Property

Mahbub, Maria, Lama, Vanessa, Das, Sanjay, Starks, Brian, Polchek, Christopher, Silvers, Saffell, Deck, Lauren, Balaprakash, Prasanna, Ghosal, Tirthankar

arXiv.org Artificial Intelligence

High-Risk Property (HRP) classification is critical at U.S. Department of Energy (DOE) sites, where inventories include sensitive and often dual-use equipment. Compliance reviews must track evolving rules set by various export control policies and make transparent, auditable decisions. Traditional expert-only workflows are time-consuming, backlog-prone, and struggle to keep pace with shifting regulatory boundaries. We demo ORCHID, a modular agentic system for HRP classification that pairs retrieval-augmented generation (RAG) with human oversight to produce auditable, policy-based outputs. Small cooperating agents (retrieval, description refiner, classifier, validator, and feedback logger) coordinate via agent-to-agent messaging and invoke tools through the Model Context Protocol (MCP) for model-agnostic, on-premise operation. The interface follows an Item-to-Evidence-to-Decision loop with step-by-step reasoning, on-policy citations, and append-only audit bundles (run cards, prompts, evidence). In preliminary tests on real HRP cases, ORCHID improves accuracy and traceability over a non-agentic baseline while deferring uncertain items to Subject Matter Experts (SMEs). The demonstration shows single-item submission, grounded citations, SME feedback capture, and exportable audit artifacts, illustrating a practical path to trustworthy LLM assistance in sensitive DOE compliance workflows.
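The Item-to-Evidence-to-Decision loop with SME deferral can be sketched as below. The retriever and classifier are stubbed with canned data; the item names, labels, confidence values, and threshold are illustrative assumptions and not ORCHID's actual agents or corpus.

```python
def retrieve(item: str) -> list[str]:
    # Stub retriever: returns policy snippets "relevant" to the item.
    # The corpus entries here are invented for illustration.
    corpus = {"centrifuge rotor": ["Policy 7.2: rotor equipment is HRP"],
              "office chair": []}
    return corpus.get(item, [])

def classify(item: str, evidence: list[str]) -> tuple[str, float]:
    # Stub classifier: evidence-backed items get a confident HRP label;
    # a real system would call an LLM grounded in the retrieved provisions.
    if evidence:
        return "HRP", 0.95
    return "non-HRP", 0.55

def decide(item: str, threshold: float = 0.8) -> dict:
    """One pass of the Item -> Evidence -> Decision loop: classify with
    citations, and defer to an SME when confidence falls below threshold."""
    evidence = retrieve(item)
    label, conf = classify(item, evidence)
    return {"item": item, "label": label, "confidence": conf,
            "evidence": evidence, "deferred_to_SME": conf < threshold}
```

The returned dictionary plays the role of an audit record: label, confidence, cited evidence, and whether the item was routed to a human reviewer.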


Rethinking deep learning: linear regression remains a key benchmark in predicting terrestrial water storage

Nie, Wanshu, Kumar, Sujay V., Chen, Junyu, Zhao, Long, Skulovich, Olya, Yoo, Jinwoong, Pflug, Justin, Ahmad, Shahryar Khalique, Konapala, Goutam

arXiv.org Artificial Intelligence

Key Points: We compare linear regression, LSTM, and Transformer models for predicting terrestrial water storage at basin scale over the globe. Linear regression remains a robust benchmark, outperforming LSTM and Transformer models in various tasks. Traditional statistical models and global datasets that capture human and natural impacts are essential for deep learning model evaluation.

Abstract: Recent advances in machine learning such as Long Short-Term Memory (LSTM) models and Transformers have been widely adopted in hydrological applications, demonstrating impressive performance among deep learning models and outperforming physical models in various tasks. However, their superiority in predicting land surface states such as terrestrial water storage (TWS), which is shaped by many factors including natural variability and human-driven modifications, remains unclear. Here, using the open-access, globally representative HydroGlobe dataset, comprising a baseline version derived solely from a land surface model simulation and an advanced version incorporating multi-source remote sensing data assimilation, we show that linear regression is a robust benchmark, outperforming the more complex LSTM and Temporal Fusion Transformer for TWS prediction. Our findings highlight the importance of including traditional statistical models as benchmarks when developing and evaluating deep learning models. Additionally, we emphasize the critical need to establish globally representative benchmark datasets that capture the combined impact of natural variability and human interventions.

Plain Language Summary: Recent progress in machine learning has led to the widespread use of deep learning models in studying land freshwater systems, but it remains uncertain whether they are always the best tools for such applications. In this study, we use a new global dataset called HydroGlobe to test different data-driven models. Surprisingly, we find that a basic linear regression model, one of the simplest tools, actually performs better than more complex models like LSTM and Transformers in predicting land water storage. Our results suggest that researchers should always compare deep learning models against simpler traditional statistical benchmarks, and that having high-quality global datasets that include both natural and human effects is crucial for building better deep learning models.

Introduction: Terrestrial water storage (TWS) is a key indicator of the world's freshwater availability, encompassing all forms of water stored on and beneath the land surface, including soil moisture, groundwater, surface water, and snow. As a fundamental component of the global hydrological cycle, accurate TWS estimates are essential for applications related to preserving ecosystems, supporting agriculture, and ensuring water and food security.
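The linear regression benchmark itself is simple enough to sketch end to end: ordinary least squares with a bias term, scored with the Nash-Sutcliffe efficiency commonly used in hydrology. The synthetic forcing data and coefficients below are illustrative, not the HydroGlobe features.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with a bias column, solved via lstsq."""
    Xb = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict(X, coef):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 means no better than
    predicting the mean of the observations."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))  # stand-ins for, e.g., precipitation/temperature forcings
y = X @ np.array([0.8, -0.3, 0.5]) + 0.1 * rng.standard_normal(200)

coef = fit_linear(X, y)
score = nse(y, predict(X, coef))
```

Any LSTM or Transformer candidate would then have to beat `score` on held-out basins to justify its extra complexity, which is the benchmarking discipline the paper argues for.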





Automatic Building Code Review: A Case Study

Wan, Hanlong, Xu, Weili, Rosenberg, Michael, Zhang, Jian, Siddika, Aysha

arXiv.org Artificial Intelligence

Building officials, especially those in resource-constrained or rural jurisdictions, struggle with labor-intensive, error-prone, and costly manual reviews of design documents as projects scale in size and complexity. Widespread adoption of Building Information Modeling (BIM) and Large Language Models (LLMs) has created opportunities for automated code review (ACR) solutions. This study proposes a novel agent-driven framework that integrates BIM-based data extraction with automated verification using both retrieval-augmented generation (RAG) and Model Context Protocol (MCP) agent pipelines. The framework employs LLM-enabled agents to extract geometry, schedules, and system attributes from heterogeneous file types, which are then processed for building code checking via two complementary mechanisms: (i) direct API calls to DOE's COMcheck engine, providing deterministic and audit-ready outputs, and (ii) RAG-based reasoning over rule provisions, allowing flexible interpretation where coverage is incomplete or ambiguous. The framework was evaluated through case demonstrations, including automated extraction of geometric attributes (e.g., surface area, tilt, and insulation values), parsing of operational schedules, and design validation for lighting allowances under ASHRAE Standard 90.1-2022. Comparative performance tests across multiple large language models showed that Generative Pre-trained Transformer 4 Omni (GPT-4o) achieved the best balance of efficiency and stability, while smaller models exhibited inconsistencies or failures. Results confirm that MCP agent pipelines outperform RAG reasoning pipelines in both rigor and flexibility across these workflows.
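The deterministic side of such a check, comparing installed interior lighting power against a building-area allowance, can be sketched as below. Real reviews take the allowed lighting power density (LPD) values from the code tables (or from COMcheck itself); the area and LPD numbers here are purely illustrative inputs, not ASHRAE 90.1 values.

```python
def lighting_power_check(areas_ft2, allowed_lpd_w_per_ft2, installed_w):
    """Deterministic pass/fail check: the total allowance is the sum of each
    space's area times its allowed LPD, compared against installed wattage.
    All numeric inputs must come from the project model and the code tables;
    nothing is hard-coded here."""
    allowance = sum(a * lpd for a, lpd in zip(areas_ft2, allowed_lpd_w_per_ft2))
    margin = allowance - installed_w
    return {"allowance_W": allowance, "installed_W": installed_w,
            "complies": margin >= 0.0, "margin_W": margin}

# Illustrative single-space project: 10,000 ft^2 at an assumed 0.8 W/ft^2 allowance.
result = lighting_power_check(areas_ft2=[10000.0],
                              allowed_lpd_w_per_ft2=[0.8],
                              installed_w=7500.0)
```

An agent pipeline would populate the inputs from the BIM extraction step and attach the returned dictionary to the audit-ready output, while the RAG pathway handles provisions this rule-based check does not cover.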